Using NLP to Efficiently Visualize Text Collections with SOMs
Authors
Abstract
Self-Organizing Maps (SOMs) are an effective method for clustering and visualizing large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to improve efficiency by using natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation. We find that reducing the document representation to just the heads of noun and verb phrases does reduce the heavy computational cost without degrading the quality of the map, whereas more severe reductions that focus on subject and object noun phrases degrade map quality.
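The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the hand-assigned head flags (standing in for the output of a real parser), and the helper names `build_vocabulary`, `vectorize`, `best_matching_unit`, and `train_som` are all assumptions made for illustration. It shows how keeping only phrase heads shrinks the bag-of-words dimension, which is the quantity that drives the per-update cost of a basic SOM.

```python
import math
import random
from collections import Counter

# Hypothetical pre-parsed documents: (token, is_phrase_head) pairs.
# In practice the flags would come from a natural language parser.
tagged_docs = [
    [("the", False), ("map", True), ("clusters", True),
     ("the", False), ("documents", True)],
    [("a", False), ("parser", True), ("removes", True),
     ("unimportant", False), ("terms", True)],
    [("the", False), ("large", False), ("map", True), ("shows", True),
     ("similar", False), ("documents", True)],
]

def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})

def vectorize(tokens, vocabulary):
    """Bag-of-words term counts, one component per vocabulary entry."""
    counts = Counter(tokens)
    return [float(counts[term]) for term in vocabulary]

def best_matching_unit(weights, vec):
    """Grid position of the unit whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda u: sum((w - x) ** 2 for w, x in zip(weights[u], vec)),
    )

def train_som(vectors, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Basic online SOM; each update costs O(rows * cols * dim)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for vec in vectors:
            bmu = best_matching_unit(weights, vec)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights

# Full representation versus heads-only representation.
full_docs = [[tok for tok, _ in doc] for doc in tagged_docs]
head_docs = [[tok for tok, is_head in doc if is_head] for doc in tagged_docs]

full_vocab = build_vocabulary(full_docs)
head_vocab = build_vocabulary(head_docs)
head_vectors = [vectorize(doc, head_vocab) for doc in head_docs]

som = train_som(head_vectors, rows=2, cols=2)
positions = [best_matching_unit(som, vec) for vec in head_vectors]
print(len(full_vocab), "terms reduced to", len(head_vocab))
print("map positions:", positions)
```

Because every weight update touches all vector components, a smaller vocabulary translates directly into less work per training step. Whether map quality is preserved is an empirical question that this toy example does not address.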
Similar resources
Natural language processing and information retrieval methods for intellectual property analysis
More intellectual property information is generated now than ever before. The accumulation of intellectual property data, further complicated by this continued increase in production, makes it imperative to develop better methods for archiving and more importantly for accessing this information. Information retrieval (IR) is a standard technique used for efficiently accessing information in suc...
Exploration of Full-text Databases with Self-organizing Maps
Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...
A combined IR/NLP approach to question answering against large text collections
We describe an approach to finding literal answer strings to natural language questions in large text collections. The approach involves linking an IR system with an NLP system that performs reasonably thorough linguistic analysis. The IR system treats the question as a query and returns a set of top ranked documents or passages. The NLP system parses the question and analyses the top ranked do...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...
Visualization of Pareto Solutions by Spherical Self-Organizing Map and It’s acceleration on a GPU
In this study, we visualize Pareto-optimum solutions derived from multiple-objective optimization using spherical self-organizing maps (SOMs) that lay out SOM data in three dimensions. There have been a wide range of studies involving plane SOMs where Pareto-optimal solutions are mapped to a plane. However, plane SOMs have an issue that similar data differing in a few specific variables are oft...